Hyper-Systolic Implementation of BLAS-3 Routines on the APE100/Quadrics

Authors

  • Marco Coletta
  • Thomas Lippert
  • Paolo Palazzari
Abstract

Basic Linear Algebra Subroutines (BLAS-3) [1] are building blocks for solving many numerical problems (Cholesky factorization, Gram-Schmidt orthonormalization, LU decomposition, ...). Their efficient implementation on a given parallel machine is key to exploiting the system's computational power fully. In this work we consider a massively parallel SIMD machine (the APE100/Quadrics [2]) and adopt the hyper-systolic method [3, 6, 4] to implement BLAS-3 efficiently on it. The results we achieved (roughly 60-70% of peak performance for large matrices) demonstrate the validity of the proposed approach. The work is structured as follows: Section 1 reviews BLAS-3; Section 2 recalls the hyper-systolic method; Section 3 describes the target machine; Section 4 presents the hyper-systolic (HS) implementation; finally, Section 5 gives some experimental results.

Similar resources

Hyper-Systolic Implementation of BLAS-3 Routines on the APE100/Quadrics Machine

Basic Linear Algebra Subroutines (BLAS-3) [Cho 92] are the building blocks for solving many numerical problems (Cholesky factorization, Gram-Schmidt orthonormalization, LU decomposition, ...). Their efficient implementation on a given parallel machine is key to exploiting the system's computational power fully. In this work we refer to a massively parallel processing SIMD machin...

Hyper-Systolic Implementation of BLAS-3 Routines in the APE100/Quadrics Machine

Basic Linear Algebra Subroutines (BLAS-3) [Cho 92] are the building blocks for solving many numerical problems (Cholesky factorization, Gram-Schmidt orthonormalization, LU decomposition, ...). Their efficient implementation on a given parallel machine is key to exploiting the system's computational power fully. In this work we refer to a massively parallel processing SIMD machin...

Hyper-systolic algorithms for N-body computations and parallel level-3 BLAS libraries

Hyper-systolic algorithms represent a new class of parallel computing structures. Because of their regular communication and compute patterns, they are well suited for implementation on most parallel architectures; in particular, high-performance SIMD machines can benefit considerably. After a short explanation of the concept of hyper-systolic algorithms, their application to N-body computations a...

BLASFEO: Basic linear algebra subroutines for embedded optimization

BLASFEO is a dense linear algebra library providing high-performance implementations of BLAS- and LAPACK-like routines for use in embedded optimization. A key difference with respect to existing high-performance implementations of BLAS is that the computational performance is optimized for small to medium scale matrices, i.e., for sizes up to a few hundred. BLASFEO comes with three different impl...

Level and BLAS in the NAG C Library

This report describes a set of matrix-vector routines (Level BLAS) and matrix-matrix routines (Level BLAS) written in C. These routines have been included in Mark of the NAG C Library and are used by other library routines in that library. Details are given of the implementation, testing and use of the routines, and a complete listing of all the ANSI C function prototypes is included in the Appendix. Th...

Publication date: 1998